posterior regularization
Kernel Bayesian Inference with Posterior Regularization, Dept. of Physics, Tsinghua University, Beijing, China
We propose a vector-valued regression problem whose solution is equivalent to the reproducing kernel Hilbert space (RKHS) embedding of the Bayesian posterior distribution. This equivalence provides a new understanding of kernel Bayesian inference. Moreover, the optimization problem induces a new regularization for the posterior embedding estimator, which is faster than and has performance comparable to the squared regularization in kernel Bayes' rule. This regularization coincides with an earlier thresholding approach used in kernel POMDPs, whose consistency had remained an open problem. Our theoretical work settles this question and provides a consistency analysis in the regression setting. Based on our optimization formulation, we propose a flexible Bayesian posterior regularization framework, which for the first time enables regularization at the distribution level. We apply this method to nonparametric state-space filtering tasks with extremely nonlinear dynamics and show performance gains over all other baselines.
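As a concrete illustration of the two regularizers the abstract contrasts, here is a minimal sketch (our own illustration, not the authors' code): the posterior-embedding weights are estimated by kernel ridge regression as in kernel Bayes' rule, and the thresholding regularization then clips negative weights and renormalizes. The RBF kernel, bandwidth, and toy model are assumptions of this sketch.

```python
# Sketch: posterior embedding weights from joint samples {(x_i, y_i)},
# comparing squared (Tikhonov) regularization with thresholding.
import numpy as np

def rbf_gram(A, B, sigma=1.0):
    """RBF Gram matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def posterior_weights_squared(Y, y_obs, lam=1e-3):
    """Kernel-ridge weights w = (G_Y + n*lam*I)^{-1} k_Y(y_obs);
    the posterior embedding estimate is mu = sum_i w_i k_X(x_i, .)."""
    n = len(Y)
    G = rbf_gram(Y, Y)
    k = rbf_gram(Y, y_obs[None, :]).ravel()
    return np.linalg.solve(G + n * lam * np.eye(n), k)

def posterior_weights_thresholded(Y, y_obs, lam=1e-3):
    """Thresholding regularization: clip negative weights to zero and
    renormalize, so the weights form a probability-like vector."""
    w = posterior_weights_squared(Y, y_obs, lam)
    w = np.clip(w, 0.0, None)
    return w / w.sum() if w.sum() > 0 else w

# Toy joint sample: x ~ N(0,1), y = x + noise; condition on y = 0.5.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
Y = X + 0.1 * rng.normal(size=(200, 1))
w = posterior_weights_thresholded(Y, np.array([0.5]))
print("posterior mean estimate:", (w * X.ravel()).sum())  # roughly 0.5
```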
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)
The authors present an intriguing take on factor analysis: namely, the use of Ganchev et al.'s posterior regularization to enforce non-negativity constraints on the posterior. They give proofs of convergence and correctness and present a scalable method for inference and learning in stacked constrained factor analysis models. The method appears to be sound and the work is an interesting direction overall, a refreshing departure from much of the existing literature, and the first instance I am aware of in which posterior regularization has been highlighted in a deep learning/unsupervised feature learning context. While I did not review it in detail, the thoroughness of the supplementary material is impressive. My main concern is that the empirical evaluation is somewhat underwhelming in light of the norms of the community.
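To make the constraint concrete, here is a minimal sketch (our illustration, not the reviewed paper's method) of non-negativity imposed on a factor-analysis posterior; the Euclidean projection of the Gaussian posterior mean onto the non-negative orthant below is a crude stand-in for the KL projection used in proper posterior regularization, and all names are assumptions of this sketch.

```python
# Sketch: non-negativity-constrained posterior mean in factor analysis.
import numpy as np

def gaussian_posterior_mean(W, psi, x):
    """For x = W z + eps, eps ~ N(0, diag(psi)), z ~ N(0, I):
    E[z | x] = (I + W^T Psi^{-1} W)^{-1} W^T Psi^{-1} x."""
    Winv = W.T / psi                      # W^T Psi^{-1} via broadcasting
    A = np.eye(W.shape[1]) + Winv @ W
    return np.linalg.solve(A, Winv @ x)

def regularized_posterior_mean(W, psi, x):
    """Project the posterior mean onto the constraint set {z >= 0}."""
    return np.clip(gaussian_posterior_mean(W, psi, x), 0.0, None)

rng = np.random.default_rng(0)
W = np.abs(rng.normal(size=(10, 3)))      # toy loading matrix
psi = np.full(10, 0.1)                    # diagonal noise variances
x = W @ np.array([1.0, 0.5, 0.0]) + 0.05 * rng.normal(size=10)
print(regularized_posterior_mean(W, psi, x))  # near [1.0, 0.5, 0.0]
```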
Reviews: Kernel Bayesian Inference with Posterior Regularization
This paper provides an interesting connection between kernel Bayesian inference and vector-valued regression. Based on this, a new regularization method is provided to compute an approximation of the kernel embedding of the posterior distribution. Simulation results look promising, suggesting that the new method improves over many existing methods. However, as a non-expert, from reading the current introduction I am still confused about the motivation for using kernel Bayesian inference: in order to approximate the kernel embedding of the posterior, a sample of iid draws (x_i, y_i) from the joint distribution of the parameter/hidden variable (X in the paper) and the data (Y in the paper) is assumed to be available. First, obtaining samples (x_i) from the posterior is itself a highly non-trivial problem.
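As context for the reviewer's question: in this literature the joint draws are typically obtained by ancestral sampling from the prior and likelihood, so no posterior sampling is required. A minimal sketch, with an assumed toy prior and likelihood:

```python
# Sketch: iid draws (x_i, y_i) from the joint p(x, y) = p(x) p(y | x).
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(loc=0.0, scale=1.0, size=n)   # x_i ~ prior p(x)
y = rng.normal(loc=np.sin(x), scale=0.2)     # y_i ~ likelihood p(y | x_i)
pairs = np.stack([x, y], axis=1)             # iid draws from the joint
```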
Reviews: Deep Generative Models with Learnable Knowledge Constraints
Summary: The paper proposes a way to incorporate constraints into the learning of generative models through posterior regularization. In doing so, the paper draws connections between posterior regularization and policy optimization. One of the key contributions of this paper is that the constraints are modeled as extrinsic rewards and learned through inverse reinforcement learning. The paper studies an interesting and very practical problem and the contributions are substantial. The writing could definitely be made clearer for Sections 3 and 4, where the overloaded notation is often hard to follow. I have the following questions: 1.
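For readers unfamiliar with the connection the review mentions, the posterior-regularization update has a closed form in which the constraint function acts exactly like a reward reweighting the model distribution. A minimal sketch over a discrete support (the fixed toy reward below is our assumption; in the reviewed paper the reward itself is learned):

```python
# Sketch: q(x) ∝ p(x) exp(alpha * f(x)) solves
# min_q KL(q || p) - alpha * E_q[f] over a discrete support.
import numpy as np

def regularized_posterior(log_p, f, alpha=1.0):
    logits = log_p + alpha * f
    logits -= logits.max()          # numerical stability
    q = np.exp(logits)
    return q / q.sum()

# Toy example: 5 states, uniform model, reward favoring state 4.
log_p = np.log(np.full(5, 0.2))
f = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
print(regularized_posterior(log_p, f, alpha=2.0))
```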
- Information Technology > Artificial Intelligence > Natural Language > Generation (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.58)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.40)
Reviews: Context Selection for Embedding Models
The authors propose an extension to the Exponential Family Embeddings (EFE) model for producing low-dimensional representations of graph data based on context (EFE extends word2vec-style word embedding models to other data types, such as counts or real numbers, by using embedding-context scores to produce the natural parameters of various exponential family distributions). They note that while context-based embedding models have been extensively researched, some contexts are more relevant than others for predicting a given target and informing its embedding. This observation has been made for word embeddings in prior work, with [1] using a learned attention mechanism to form a weighted average of predictive token contexts and [2] learning part-of-speech-specific classifiers to produce context weights. Citations to this related work should be added to the paper. There has also been prior work that learns fixed position-dependent weights for each word embedding context, but I am not able to recall the exact citation.
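To make the EFE scoring the review describes concrete, here is a minimal sketch with optional per-context weights standing in for the kind of context selection or attention weighting discussed; the tensor names and the Poisson link are illustrative assumptions, not the paper's notation.

```python
# Sketch: EFE natural parameter from a (weighted) context average.
import numpy as np

def natural_parameter(rho_i, alphas, weights=None):
    """eta_i = rho_i . (weighted mean of context vectors alpha_j).
    weights=None gives the plain EFE context average; learned weights
    implement 'some contexts matter more than others'."""
    if weights is None:
        weights = np.full(len(alphas), 1.0 / len(alphas))
    context = (weights[:, None] * alphas).sum(0)
    return rho_i @ context

# Poisson EFE: the natural parameter is the log-rate, so rate = exp(eta).
rng = np.random.default_rng(0)
rho_i = rng.normal(size=8)                # embedding of target item i
alphas = rng.normal(size=(4, 8))          # context vectors of neighbors
w = np.array([0.7, 0.1, 0.1, 0.1])        # toy learned relevance weights
print("Poisson rate for target i:", np.exp(natural_parameter(rho_i, alphas, w)))
```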
Infinite Latent SVM for Classification and Multi-task Learning, Jun Zhu, Ning Chen, and Eric P. Xing, Dept. of Computer Science & Tech., TNList Lab, Tsinghua University, Beijing 100084, China
Unlike existing nonparametric Bayesian models, which rely solely on specially conceived priors to incorporate domain knowledge for discovering improved latent representations, we study nonparametric Bayesian inference with regularization on the desired posterior distributions. While priors can indirectly affect posterior distributions through Bayes' theorem, imposing posterior regularization is arguably more direct and in some cases can be much easier. We particularly focus on developing infinite latent support vector machines (iLSVM) and multi-task infinite latent support vector machines (MT-iLSVM), which explore the large-margin idea in combination with a nonparametric Bayesian model for discovering predictive latent features for classification and multi-task learning, respectively. We present efficient inference methods and report empirical studies on several benchmark datasets. Our results appear to demonstrate the merits inherited from both large-margin learning and Bayesian nonparametrics.
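For readers new to the idea, the "regularization on the desired posterior distributions" in this abstract can be written schematically as a constrained variational problem: the post-data distribution q is kept close, in KL divergence, to the Bayesian posterior while softly satisfying large-margin constraints on expected discriminant functions. The notation below is ours, not the paper's:

```latex
\min_{q(M),\, \xi \ge 0} \;
  \mathrm{KL}\big(q(M) \,\|\, p(M \mid \mathcal{D})\big) + C \sum_{d} \xi_d
\quad \text{s.t.} \quad
\mathbb{E}_{q}\big[ f(M; \mathbf{x}_d, y_d) \big] \ge \ell_d - \xi_d
\quad \forall d.
```

With the constraints removed, the optimum is the usual Bayesian posterior, which is why this view makes the posterior, rather than the prior, the direct target of regularization.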
- Asia > China > Beijing > Beijing (0.40)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)